ABSTRACT



Classroom teachers need to make sound judgments and decisions concerning
curricular and instructional issues. Teachers who wish to become more effective
in the classroom should learn to develop their own research designs, since
educational research reported in journals is often inconclusive, conflicting, or
not relevant to classroom teachers. Experimental studies can be particularly
helpful to teachers if they investigate the effects of variables such as
textbooks, tests, and homework assignments on one or more other variables such
as student knowledge, student attitudes, and length of time required to complete
a test or homework assignment. Factors to be considered when designing an
experimental study include experimental validity (comparing and contrasting
results with results from a control group or situation), internal validity
(extent to which the observed effect appears to be due to one's
experimentation), and external validity (determination of persons and
circumstances to which the results apply). The conclusion is that teachers can
do valid classroom research if they combine simple descriptive statistics with
common sense and exercise care in maintaining experimental validity when they
gather information.
(DB)



Reproductions supplied by EDRS are the best that can be made
from the original document.



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



DESIGN CONSIDERATIONS
FOR CLASSROOM RESEARCH*



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY



TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC) AND
USERS OF THE ERIC SYSTEM."



James P. Shaver 
Utah State University
Logan, Utah 



*Paper prepared for a section meeting, "Teacher Research in the Classroom: How to Do It," annual meeting of the National Council for the Social Studies, Houston, November 23, 1978.




Why should a classroom teacher do research? Realistically, not with the intent of making a contribution to Scientific (with a capital S) knowledge, that is, the systematic explanations of phenomena that are labeled "theory." Teachers' other professional interests legitimately consume so much of their time and energy that little of either is left over for conceptualizing scientific studies and seeking the resources to carry them out. However, teachers are instructional decision-makers for whom systematic data can be of considerable assistance. Much relevant information is not available unless teachers gather it through their own efforts. The findings of educational research reported in journals are too often inconclusive, conflicting, or not pertinent to the matters that concern teachers, either in building instructional programs or in interacting with students day by day. (The lack of fruitfulness of research in social studies education has been most recently documented by Wiley, 1977.) Teachers who want data as a basis for decision-making will have to produce much of it themselves. To do so scientifically (with a lower case s), that is, systematically and objectively, is possible, even within the constraints of the teaching situation.

Variations of experimental studies, i.e., studies in which some variable (e.g., the textbook, the quantity or quality of resource materials, types of film, types of homework assignments, types of items on tests) is investigated in order to determine its effect on one or more other variables¹ (e.g., student knowledge, student attitudes, proportion of completed homework, length of time to complete a test), hold the most promise for teachers who are interested in improving their instructional effectiveness. In designing such studies, there are some common sense notions (educational researchers talk about them using rather technical language) which can be helpful in producing valid results.

Experimental Validity²

Experimentation as a means of gathering information depends on, among other things, being able to make comparisons and contrasts. Imagine, for example, a teacher teaching an "Energy and the Environment" unit for the first time. She wants to know whether student knowledge about alternative sources of fuel is greater as a result of the unit. She teaches the unit to one of her classes and gives them a final exam. The teacher has used what is often referred to as the one-group, posttest design. She knows the students' scores on the final exam; but, without some basis for comparison, she does not know if her students' scores are any, or much, different from what they would have been without the unit. (She may, of course, make an "intuitive" comparison, such as to the information indicated by the students' comments prior to studying the unit. Such observations are important sources of information, but they are fraught with opportunities for invalid conclusions.)

¹The first type of variable is typically called an independent variable, or a treatment variable. The second is called a dependent variable, because the researcher wants to know if values on it (e.g., number of correct answers on a test) are dependent upon the treatment variable.

²The following discussion relies heavily on a classic analysis of research by Donald T. Campbell and Julian C. Stanley, first published in the Handbook of research on teaching (edited by N. L. Gage and published by Rand McNally, 1963) as Chap. 5, "Experimental and quasi-experimental designs for research on teaching." The chapter has been reprinted as a separate paperback book by Rand McNally with the shorter title, Experimental and quasi-experimental designs for research.

To improve her design, the teacher might decide to obtain another set of scores from her students to compare against the final exam scores. For example, she could administer a test to her students before the unit and then compare scores on the final exam to those pretest scores. This would be what is termed a one-group, pretest-posttest design, somewhat better than the one-group, posttest design, but not much. To understand why, it is helpful to consider some of those common sense notions that educational researchers have about designing studies to get valid results, i.e., so as to have experimental validity.

Experimental validity is commonly considered to have two aspects: internal validity and external validity. Internal validity has to do with the extent to which you can assume that any observed effect (e.g., gains on the test of knowledge of alternative fuel sources) is due to your treatment; or, put conversely, the extent to which you can assume that the treatment's effect has been observed (it might be, for example, that students learned from the unit on energy, but for one reason or another the learning was not reflected in the test scores). External validity has to do with the extent to which you can generalize your finding(s) beyond the particular study from which they were obtained. (E.g., if the students gained in their knowledge of fuel sources, can the teacher assume that she will attain the same results using the unit with future classes?) Internal validity is the more important of the two, because unless one can be fairly certain that his or her treatment has had the desired effect, it is meaningless to ask with whom or under what conditions the effect can be expected to occur.

Threats to Internal Validity. Being aware of several common threats to internal validity can help teachers to design studies that will provide more reliable information for their decision-making purposes. The one-group, pretest-posttest design, mentioned above, is subject to most of the threats that need to be considered. Let us assume that the teacher using the Energy and Environment Unit obtained as large a gain in scores from the pretest to the posttest as she had hoped for. What might account for the gain other than the treatment (the unit)?

One threat to the internal validity of her result is testing. It may be that taking the pretest affected the students in some way (they learned the names of various fuels from reading the multiple-choice items, or taking the test alerted them to news items they wouldn't have noticed otherwise, or the test piqued their interest so that they sought readings about fuels outside of class), so that what looked like a gain due to the unit was really due to taking the test. Also, if a test is not valid (i.e., if it does not measure the teacher's instructional goals), treatment effectiveness, or lack of it, may not be detected.

Another threat to internal validity is what researchers call history. This term is used technically to refer to experiences other than the treatment variable that the students might have between the time a treatment starts and the time it ends. (Social studies teachers are used to thinking of history as what has happened in the past, e.g., what had happened to the students before a new unit is taught. Those prior experiences may be a threat to validity, but researchers talk about that under the term "selection," which we will discuss shortly.) For example, many of the students in the Energy Unit teacher's class may have watched a TV special on energy during the unit, with that experience accounting for their better scores on the final exam.

Another possible threat to internal validity is instrument decay. That is, changes in the test (e.g., if different pretests and posttests were used) or in scoring the test might produce a change in scores. The latter source of instrument decay is a particularly likely threat. If a teacher knows which are the pretests and which are the posttests, she or he may, consciously or unconsciously, tend to score them differently. For example, an expectation that students will do better on the posttest might affect one's scoring judgments. Scoring "blind" (without knowing which are the pre- and which the posttests) is a good idea.
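As a rough sketch of one way to arrange blind scoring, the Python below assumes each paper can be marked with a neutral code before scoring begins; the function name `blind_batch` and the `P001`-style codes are hypothetical, not part of this paper. It shuffles the codes so the scoring order gives no hint of which papers are pretests, and keeps the key that maps codes back to students and phases set aside until scoring is finished.

```python
import random

def blind_batch(papers, seed=None):
    """Assign each paper a neutral code and return (scoring_order, key).

    papers: list of (student, phase) tuples, phase being "pre" or "post".
    The scorer works only from scoring_order; the key that maps each code
    back to (student, phase) is set aside until every paper is scored.
    Function name and code format are hypothetical illustrations.
    """
    rng = random.Random(seed)
    codes = [f"P{n:03d}" for n in range(1, len(papers) + 1)]
    rng.shuffle(codes)                  # codes no longer follow the paper order
    key = dict(zip(codes, papers))      # code -> (student, phase)
    scoring_order = sorted(key)         # score in code order, mixing pre and post
    return scoring_order, key

# Hypothetical class of three students, each with a pretest and a posttest.
papers = [(name, phase) for name in ["Ana", "Ben", "Cal"] for phase in ("pre", "post")]
order, key = blind_batch(papers, seed=42)
scores = {code: None for code in order}  # fill in while scoring, unblind afterwards
```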

Other threats to internal validity are not likely to have affected the results of the Energy Unit research. Nevertheless, they bear mention because of their applicability to other research studies that teachers might want to do.

One such threat is maturation. That is, sometimes an observed effect is due to changes in the students that occur as a function of the passing of time. For example, if Piaget is correct, we can expect children to move from the preoperational stage of thinking to the concrete operational stage at about age seven. Imagine a teacher who throughout the year uses a set of exercises with her second grade class, hoping to increase the students' ability to think in concrete operational terms. To determine the exercises' effectiveness, she uses the one-group, pretest-posttest design. She finds a large average gain in scores; but the gain might be due only to maturation rather than to her exercises.

Maturation can have a deleterious effect, too. Fatigue or hunger are considered maturation processes. If, for example, the Energy and Environment Unit was taught in the late afternoon when students were fatigued, that fatigue might counter any positive effects of the unit.

Another threat to internal validity is that of statistical regression. This threat has a rather complex statistical explanation, but it also can be explained in a fairly common sense way. Statistically, we say that students who had extreme scores on a pretest will tend to have scores closer to the group mean (arithmetic average) on the posttest, even without treatment. Let us say, for example, that our Energy Unit teacher is especially interested in helping "slow" students do better. So she administers her pretest and later selects for analysis those students who did most poorly on it (perhaps those who had the bottom ten percent of the scores). She compares the mean pretest score for this selected group with its mean posttest score. She would likely find a gain, because we would expect the scores of these students to move toward the group mean, i.e., to be higher in this case. That expectation can, as I mentioned, be explained statistically, but doing so involves getting into such matters as normal probability distributions. On a more common sense level, we can think of the students who got the lowest ten percent of scores as probably knowing less than many of the other students, but also as likely to have had "bad luck" on the first taking of the test: they guessed poorly, or happened to be especially fatigued or emotionally upset for some reason. On the posttest, they are likely to have better "luck" while other students have "bad luck." So, while the selected group of students will still have low scores, their scores will tend to be somewhat better than on the pretest (even if they had not been exposed to the energy unit). They will have moved toward the mean, and other "low knowledge" students who had "bad luck" on the second testing will have even lower scores.

Note that the regression effect works at both ends of the distribution. If the teacher had picked the ten percent of students with the highest scores on the pretest, their posttest scores would likely have gone down. She might have concluded that the treatment was not effective with "bright" students. But the notion of "good luck" is as applicable to students who would do well anyway as the notion of "bad luck" is to students who do poorly. Just as some students would be in the bottom ten percent because of "bad luck," some would be in the top ten percent because of "good luck," and their scores would be likely to move toward the mean on the posttest.
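The "luck" explanation can be checked with a small simulation. The Python sketch below is an illustration under stated assumptions, not a procedure from this paper: each simulated student has a fixed true knowledge score, each testing adds independent random "luck," and no treatment occurs between the two tests. The score scale (mean 50, spread 10), the luck spread, and the function name `simulate_regression` are invented for the example.

```python
import random
import statistics

def simulate_regression(n_students=1000, luck_sd=10.0, seed=1):
    """Simulate a pretest and a posttest with NO treatment in between.

    Assumed model (hypothetical): each student has a fixed 'true knowledge'
    score, and each testing adds independent 'luck' (guessing, fatigue, mood).
    Returns pretest and posttest means for the bottom and top ten percent,
    both groups selected on their pretest scores.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n_students)]
    pre = [t + rng.gauss(0, luck_sd) for t in true_scores]
    post = [t + rng.gauss(0, luck_sd) for t in true_scores]   # nothing added for treatment

    order = sorted(range(n_students), key=lambda i: pre[i])   # lowest pretest first
    k = n_students // 10
    groups = {"bottom": order[:k], "top": order[-k:]}

    return {
        label: (statistics.mean(pre[i] for i in members),
                statistics.mean(post[i] for i in members))
        for label, members in groups.items()
    }

for label, (pre_mean, post_mean) in simulate_regression().items():
    print(f"{label:>6}: pretest {pre_mean:5.1f} -> posttest {post_mean:5.1f}")
```

Under this model the bottom tenth rises and the top tenth falls on the posttest although nothing was done to either group. Setting `luck_sd=0` removes the effect, since the posttest then repeats the pretest exactly.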

This is, of course, a much oversimplified discussion of statistical regression. The major message is, however: Be careful when comparing pre- and posttest scores for students selected because of extreme pretest (or other, such as IQ or social adjustment) scores. It may appear that your treatment had an effect when there was none, or an effect that did occur may be obscured.

Two More Threats and Design Considerations. What to do about these threats? The researcher's answer has been to add one or more comparison groups to the design. These are often called control groups, although that can be a misnomer, as it suggests that "nothing happens" to them, when in reality they usually receive some alternative treatment.

One "comparison group" design that is used on occasion provides an opportunity to discuss two more threats to internal validity. It is called the static group design. In this design two "natural" groups are compared, one of which has had the treatment of interest and the other of which has not, but with no opportunity to administer a pretest. For example, the end-of-the-year standardized achievement scores of a group of students who were part of a class that included a political participation project might be compared against the scores of other students in the school who took a social studies course without such a project.

Lack of control over how students got into the project class might result in differential selection, a threat to internal validity. If, for example, students were free to choose which class they enrolled in, their reasons for enrolling in the participation class or the other might be based on factors related to achievement test performance, such as interest in social studies. There would be no way to establish that the groups are equivalent, for in whatever other ways they may be similar, the students are different in one crucial regard: one group signed up for the project class, the other didn't. Effects might be due to the treatment (the participation project) or to the initial differences in the groups.




Sometimes experimental mortality is a threat to internal validity, too. Here mortality is not used in the sense of students or teachers dying, but in the sense that there may be a differential loss of students from the groups compared. For example, if students did not know about the political participation project beforehand and were allowed to drop the class if they wished after finding out about it, that would be experimental mortality. Just as with differential selection, mortality means that the groups may be different in important ways (the importance depends, of course, on how many students drop out and on how different they are from those who stay) which are often nearly impossible to determine with any precision. What appears to be a difference due to treatment may simply be the result of losing students.

Where possible, researchers would like to select their own treatment and control (or alternative treatment) groups. In particular, the best procedure is one that ensures that students are randomly assigned to the groups. This could be done by pulling names out of a hat as well as by the statistically sophisticated use of a table of random numbers. Any procedure that ensures that each person has an equal chance of being chosen for each of the groups is satisfactory. Random selection guarantees that there will be only chance differences between the groups.
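Drawing names from a hat translates directly into a computer shuffle. This is a minimal Python sketch; the function name `random_assign` and the sample roster are hypothetical, and Python's standard pseudorandom generator stands in for a table of random numbers. When the class has an odd number of students, a coin flip decides which group gets the extra one, so every student keeps an equal chance of landing in either group.

```python
import random

def random_assign(names, seed=None):
    """Split a class list into experimental and control groups by chance,
    the electronic equivalent of drawing names out of a hat.
    (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    hat = list(names)                   # copy, so the original list is untouched
    rng.shuffle(hat)
    half = len(hat) // 2
    if len(hat) % 2 and rng.random() < 0.5:
        half += 1                       # odd class size: coin flip for the extra student
    return {"experimental": hat[:half], "control": hat[half:]}

# Hypothetical roster of nine students.
roster = ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus", "Hal", "Ida"]
groups = random_assign(roster, seed=7)
```

Passing a `seed` makes the split reproducible, which helps if you later need to show exactly how the groups were formed; omit it for a fresh draw.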

Of course, once in a while, chance differences between groups will be large. For example, even though there are twenty girls and twenty boys in the group from which an experimental and a control (or alternative-treatment) group are chosen, and a random process is followed, by chance a large proportion of the girls might end up in one group and a large proportion of the boys in the other.



Stratification on a characteristic such as sex can be helpful in ensuring that it will be properly represented in both groups. That is, the girls and boys can be treated like two different groups (i.e., stratified) for purposes of selection, with the random selection process applied to one group first and then to the other. Multiple strata can be used. For example, prior to selection for an Energy Unit study, the groups of boys and girls could be further stratified according to whether or not they had previously taken a biology course.
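Stratified assignment is the same shuffle applied separately within each stratum. In this Python sketch the function name `stratified_assign` and the field names `sex` and `took_biology` are hypothetical choices for illustration; the caller supplies a function that returns a student's stratum, so either a single characteristic or a combination such as sex and prior biology can be used.

```python
import random
from collections import defaultdict

def stratified_assign(students, stratum_of, seed=None):
    """Randomly assign students to two groups separately within each stratum.

    students: list of dicts; stratum_of: function returning a student's
    stratum label, e.g. lambda s: s["sex"] or lambda s: (s["sex"], s["took_biology"]).
    Names and fields are hypothetical.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for student in students:
        strata[stratum_of(student)].append(student)

    groups = {"experimental": [], "control": []}
    for label in sorted(strata):        # fixed order, so a seed reproduces the split
        members = list(strata[label])
        rng.shuffle(members)
        half = len(members) // 2
        if len(members) % 2 and rng.random() < 0.5:
            half += 1                   # odd-sized stratum: coin flip for the extra student
        groups["experimental"].extend(members[:half])
        groups["control"].extend(members[half:])
    return groups

# Hypothetical class: twenty girls and twenty boys, as in the example above.
students = (
    [{"name": f"G{i}", "sex": "F", "took_biology": i % 2 == 0} for i in range(20)]
    + [{"name": f"B{i}", "sex": "M", "took_biology": i % 3 == 0} for i in range(20)]
)
by_sex = stratified_assign(students, lambda s: s["sex"], seed=1)
by_sex_and_biology = stratified_assign(
    students, lambda s: (s["sex"], s["took_biology"]), seed=1)
```

Stratifying on sex alone guarantees ten girls and ten boys in each group here; adding prior biology keeps each of the four strata split as evenly as its size allows.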

Matching can be an aid at times. To ensure that all IQ levels are represented in both groups, a teacher could rank all of the students according to their IQ scores, put into pairs those with the closest rankings, and then randomly assign (perhaps by flipping a coin) the members of each pair to the experimental or control group. One might also want to match on other variables, such as GPA. But when two or more matching variables are used, problems often occur because good-fitting pairs cannot be found. There is always the student with an IQ of 140 and a D- GPA, or an A GPA and an IQ of 90, who has no counterpart.
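The rank, pair, and flip procedure for a single matching variable might look like this in Python. It is a sketch only: the function name and the IQ figures are hypothetical, and students are matched on one variable, which sidesteps the difficulty of finding pairs that fit on two or more.

```python
import random

def matched_pairs_assign(iq_scores, seed=None):
    """Rank students by IQ, pair each student with the next one in the ranking,
    and flip a coin within each pair to decide who gets the treatment.

    iq_scores: dict mapping name -> IQ (hypothetical data).  With an odd
    number of students the lowest-ranked student has no partner and is
    returned as unmatched.
    """
    rng = random.Random(seed)
    ranked = sorted(iq_scores, key=iq_scores.get, reverse=True)
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        first, second = ranked[i], ranked[i + 1]
        if rng.random() < 0.5:          # the coin flip
            first, second = second, first
        experimental.append(first)
        control.append(second)
    unmatched = ranked[-1:] if len(ranked) % 2 else []
    return experimental, control, unmatched

# Hypothetical IQ scores for seven students.
iq = {"Ana": 140, "Ben": 137, "Cal": 118, "Dee": 115, "Eli": 96, "Fay": 92, "Gus": 88}
experimental, control, unmatched = matched_pairs_assign(iq, seed=3)
```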

Also, of course, matching can be done within strata (e.g., match boys on IQ and then randomly select the groups, and do the same for the girls). When only one or two stratification variables and one or two matching variables are used, the procedure is not too cumbersome; but it can easily get out of hand.

If random selection of students for the experimental and control (or alternative treatment) groups is possible, a major step has been taken toward controlling many of the threats to internal validity, because it can be assumed that there are only chance differences between the groups. (If stratification and/or matching can be used sensibly, all the better.) For example, the threat of maturation is neatly controlled by randomization, as is differential selection. So is statistical regression, if the students with extreme scores are randomly selected into the experimental and control groups. Then the tendency for their scores to move toward the mean on the second testing will be expected to operate equally (within chance differences) on both groups, and any change for the experimental group above and beyond that of the control group may be attributed to the treatment, provided other threats to internal validity have been controlled. The same is true for testing. Any effect of the pretest should be present for both the experimental and control groups, so any gain by the experimental group over the control group may be attributed to the treatment, provided no other threat accounts for it.

History and mortality are not controlled so neatly. For example, the location of a classroom may cause a history effect not controlled by random selection: perhaps a positive effect if it is right next to the school media center, a negative one if it is right next to a classroom in which another teacher has trouble controlling the students. And the means of selection will not keep students from dropping out, especially from voluntary programs. (Sometimes rate of dropouts is a relevant dependent variable.) If the dropout rate from an experimental or control group seems high, it is a good idea to check the characteristics of those dropping out (e.g., pretest scores, GPA, reading scores, sex) to see if they differ from the students who stay and thus might have an effect on your results.
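The check on dropouts needs nothing more than simple descriptive statistics. Here is a minimal Python sketch, with a hypothetical function name and invented scores, that compares the mean pretest score of students who left a group with that of students who stayed; the same comparison could be run on GPA or reading scores.

```python
from statistics import mean

def compare_dropouts(pretest, stayed):
    """Compare pretest means for students who stayed and students who dropped out.

    pretest: dict mapping name -> pretest score for everyone who started.
    stayed: set of names still in the group at posttest time.
    Returns (mean for stayers, mean for dropouts or None, number of dropouts).
    (Hypothetical helper, for illustration only.)
    """
    stayers = [score for name, score in pretest.items() if name in stayed]
    dropouts = [score for name, score in pretest.items() if name not in stayed]
    if not dropouts:
        return mean(stayers), None, 0
    return mean(stayers), mean(dropouts), len(dropouts)

# Hypothetical pretest scores for five students who started the unit.
pretest = {"Ana": 82, "Ben": 45, "Cal": 77, "Dee": 40, "Eli": 90}
print(compare_dropouts(pretest, stayed={"Ana", "Cal", "Eli"}))
```

With these invented scores the two students who dropped out averaged 42.5 on the pretest against 83 for those who stayed, the kind of gap that would make a simple posttest comparison suspect.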

Of course, while random assignment is a major assist in securing valid research results, and is therefore worth emphasizing, teachers often cannot randomly assign students for their research projects. Nevertheless, instances where it is possible to do so should not be overlooked. For example, in a team teaching setting we were once able to assign students randomly to small group sessions in which different discussion styles were used. And in studies involving the manipulation of materials, with students unaware of the differences in materials, randomization is sometimes readily accomplished. For example, if a teacher wanted to know whether putting the essay items first on a test containing essay and objective items made a difference in student performance or student attitudes toward the test, the test could be made up in the two formats and handed out to students on a random basis. Such a design is difficult to use if students can observe each other's materials. The teacher needs to be ready with an explanation when Johnny cries out, "But Billy has a different test than I do!"

Even without random selection, the use of a comparison group can be helpful in interpreting your research results. With awareness of the potential threats to internal validity, use your good judgment to obtain a control group that is as similar as possible to your treatment group. And, even if you cannot select individual students, you may be able to decide randomly (say, by the flip of a coin) which of the two groups will be given the special treatment that is to be evaluated. Then gather any indications of initial group differences (again, such as pretest scores, GPA, sex, reading scores) and take these into account in weighing your results.

There are statistical techniques for adjusting group posttest means to take into account initial differences, such as on the pretest. But teachers doing research will often not know about such techniques or have the facilities or the time to do the computations. That need not be a serious disadvantage. In fact, not relying on statistical analysis can be an advantage in that it forces you to examine your data. You should specify ahead of time how much of a gain by your treatment group over your control group would satisfy you that the treatment was sufficiently effective to be continued. You would probably want to anticipate a larger gain if the new treatment was costly, in money or in your time, than if it was not. Then, first, compare the treatment group's pretest scores with its posttest scores (to make certain a gain occurred); second, compare the treatment group's posttest scores with those for your control group to determine whether the difference reaches your criterion. Then interpret the result carefully, taking into account any potential threats to internal validity, especially differential selection.
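The two comparisons just described can be written out in a few lines. This Python sketch is illustrative only: the function name, the sample scores, and the 3-point criterion are hypothetical, and the criterion must be fixed before the data are examined. It deliberately stops at the descriptive comparison; weighing threats such as differential selection remains a matter of judgment.

```python
from statistics import mean

def evaluate_against_criterion(treat_pre, treat_post, control_post, criterion):
    """Two descriptive comparisons, judged against a criterion set in advance.

    criterion: the posttest advantage of the treatment group over the control
    group (in score points) that the teacher decided, before seeing the data,
    would justify continuing the treatment.  (Hypothetical helper.)
    """
    gain = mean(treat_post) - mean(treat_pre)              # first: did a gain occur?
    advantage = mean(treat_post) - mean(control_post)      # second: how far above the control?
    return {
        "treatment_gain": gain,
        "advantage_over_control": advantage,
        "meets_criterion": gain > 0 and advantage >= criterion,
    }

# Hypothetical scores for a small treatment group and a control group.
result = evaluate_against_criterion(
    treat_pre=[12, 15, 10, 14],
    treat_post=[20, 22, 18, 21],
    control_post=[17, 18, 15, 16],
    criterion=3,
)
```

With these invented scores the treatment group gained 7.5 points and finished 3.75 points above the control group, which meets the 3-point criterion set in advance.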

And, whenever possible, replicate your study. That is, do it again, on different groups and in different semesters, to see if the same results occur. The more times they do, the more confidence you can feel in the treatment.

Other Designs. To this point, I have emphasized the use of two or more groups in order to have a basis for comparison to determine if your treatment did have an effect. There are single group designs that can be valid and useful for the teacher, because they allow the demonstration of effects. One of these designs is very powerful if you are concerned with behavior that is repetitive and can come and go, such as disruptive classroom behavior, rather than learnings that are more lasting (i.e., not readily subject to reversal, such as being able to explain the functions of separation of powers in our governmental system). This design is often called the ABA design. With it, one first obtains an estimate of the behavior to be changed. This might involve counting the number of times that students are out of their seats during several class periods. These pretreatment data are called the baseline; they are the base for comparison. Next, the treatment is introduced (e.g., allowing students to talk to a buddy for five minutes at the beginning of the next class period if they stay in their seats for a specified period of time) and out-of-seat behavior is counted again. If the frequency goes down, you may assume the treatment had an effect. To provide a further test, the treatment is removed, and the number of times that students are out of their seats is counted again. If the out-of-seat instances go up, then there is strong evidence that it was the treatment keeping them in their seats during the experimental period. There are, then, three phases in the ABA design: the baseline phase (the first A), the treatment phase (the B), and the measurement phase following withdrawal of treatment (the second A).
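Reading an ABA study comes down to comparing phase means. In this Python example the counts of out-of-seat instances per class period are invented, the function name is hypothetical, and "supports an effect" means only that the means move in the predicted pattern (down under the treatment, back up after withdrawal); it is not a significance test.

```python
from statistics import mean

def aba_summary(baseline, treatment, withdrawal):
    """Summarize an ABA study from per-period behavior counts.

    Each argument is a list of counts (e.g., times students were out of
    their seats), one entry per class period observed in that phase.
    (Hypothetical helper, for illustration only.)
    """
    a1, b, a2 = mean(baseline), mean(treatment), mean(withdrawal)
    return {
        "baseline_mean": a1,
        "treatment_mean": b,
        "withdrawal_mean": a2,
        # Predicted pattern: fewer instances under treatment, more after it is removed.
        "pattern_supports_effect": b < a1 and a2 > b,
    }

# Hypothetical out-of-seat counts for four class periods in each phase.
summary = aba_summary(baseline=[14, 12, 15, 13],
                      treatment=[6, 5, 7, 4],
                      withdrawal=[12, 13, 11, 14])
```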

An alternative design is available in cases where the students might react to withdrawal of the treatment ("How come we aren't getting to talk for staying in our seats like we did last week?") or the outcome of interest wouldn't be expected to change as the result of withdrawing the treatment (one wouldn't expect students who learned to explain separation of powers through a special reinforcement program to forget the explanation when the reinforcement was removed). This design is called the multiple-baseline design. Again, baseline data are collected, but the treatment is introduced to different students or groups of students at different times to see if change occurs with introduction of the treatment. This design could be used when, for example, the teacher had two or more classes, all studying the same subject area.
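A multiple-baseline record can be summarized class by class. In this Python sketch the class names, daily quiz scores, and starting days are all hypothetical; the pattern to look for is that each class's scores change only after the treatment reaches that class, while a class still in its baseline phase stays flat.

```python
from statistics import mean

def multiple_baseline(series, start_day):
    """Before/after means for each class in a multiple-baseline study.

    series: dict mapping class name -> list of measurements, one per day.
    start_day: dict mapping class name -> index of the first treatment day
    for that class; the starts are staggered from class to class.
    (Hypothetical helper, for illustration only.)
    """
    summary = {}
    for name, values in series.items():
        start = start_day[name]
        summary[name] = {"before": mean(values[:start]), "after": mean(values[start:])}
    return summary

# Hypothetical daily quiz scores (out of 10) for two classes.
series = {
    "Period 2": [4, 5, 4, 9, 9, 10, 10, 9],
    "Period 5": [5, 4, 5, 5, 4, 9, 10, 10],
}
start_day = {"Period 2": 3, "Period 5": 5}
summary = multiple_baseline(series, start_day)

# Period 5 on days 3-4, when Period 2 was already treated but Period 5 was not.
period5_waiting = mean(series["Period 5"][3:5])
```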

The above designs are variations of what is termed the time series design. In a time series study, the dependent variable is assessed at different points in time prior to the treatment. Then the treatment is introduced, and more measurements of the dependent variable are obtained. The series of measurements (often the means of the various assessments) is studied to determine if there was a change in pattern following the treatment. (The nature of the expected change should be predicted beforehand as a basis for demonstrating that the treatment had an anticipated effect.) The study to determine a way to keep students in their seats would fit this design well, perhaps better than the ABA design. Counts of out-of-seat behavior could be taken on several consecutive days, the treatment introduced, and counts of instances of out-of-seat behavior continued for several more days while the treatment continued. If out-of-seat behavior went down and stayed down as predicted following the introduction of the treatment, this would be powerful evidence for the effectiveness of the treatment. Of course, it would be important to check behavior for a sufficiently long period of time to ensure that the result was not a transient one, going away, for example, when the newness of the treatment wore off. Replication with other groups is important with time series studies because this design is especially vulnerable to the threats of history. Could something else, such as a stern reprimand and threat of punishment by the principal, have caused the change? Instrument decay must also be guarded against. For example, over the period of time could the teacher simply have become careless about counting times out of seat?
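For the out-of-seat example, a time series can be screened for the predicted change and for whether that change lasted. This Python sketch assumes daily counts and a single, known starting day; the function name, the counts, and the choice of the last three days as a check on novelty effects are hypothetical. It says nothing about history or instrument decay, which still have to be ruled out by judgment and replication.

```python
from statistics import mean

def time_series_check(counts, start, late_window=3):
    """Screen an interrupted time series of daily out-of-seat counts.

    counts: daily counts in order; start: index of the first treatment day.
    late_window: how many final days to use when checking that the drop
    lasted after the novelty wore off (a hypothetical choice, not a rule).
    The predicted change, fixed in advance, is a decrease.
    """
    before, after = counts[:start], counts[start:]
    late = after[-late_window:]
    return {
        "mean_before": mean(before),
        "mean_after": mean(after),
        "mean_last_days": mean(late),
        "went_down": mean(after) < mean(before),
        "stayed_down": mean(late) < mean(before),
    }

# Hypothetical counts: five baseline days, then eight days with the treatment.
daily_counts = [13, 15, 14, 12, 14, 7, 6, 6, 5, 6, 5, 6, 5]
result = time_series_check(daily_counts, start=5)
```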

Some Comments. It is not likely that you will be able to control all of the potential threats to internal validity in your classroom research. But then, educational researchers can rarely do so in their studies either, especially when they are working in applied areas. By being aware of the threats, however, you can make some design decisions to help avoid them. And such awareness can also help you in interpreting your findings. Your knowledge of your students, your school, and your community will be invaluable as you decide if any of the threats may have contaminated your results. You will probably want to be circumspect in drawing conclusions from your results if they have not been replicated on more than one group and for more than one unit or semester. Such replication is important not only for building your confidence in whatever treatment effects you have observed, but in deciding, if they came out as you wished, how generalizable they are. That takes this discussion to the other aspect of experimental validity: external validity.

External Validity

The basic question of external validity is, To what persons and to what
circumstances do your results apply?† The answer to this question requires,
first of all, a careful, common-sense look at the students in your research
group(s). Are they like the other students with whom you would like to use
the materials, teaching method, or whatever you are trying out? (I.e., do
they represent the population of interest to you?) Is there anything about
the students in your treatment group that might make the materials, etc.,
work especially well or poorly? Or, is your control group such (e.g., poorly
motivated) that it makes your treatment effect appear greater than it is?‡
An excellent way to answer these questions, aside from your own best
judgment, is to replicate your study. Repeat it with different groups,
especially from one school year to the next.

† External validity is discussed in Campbell and Stanley (1963). A more
extended treatment is available in Bracht and Glass (1968). Also, many
educational research textbooks discuss both internal and external validity
in greater detail than I could in this paper.

‡ Such an instance is reported in Oliver and Shaver (1974, Appendix,
Sections [illegible] and 3).

A critical aspect of external validity is you, the teacher. If you
are not interested in advocating that other teachers use your experimental
treatment, your problems of generalization are simplified. You will not
have to worry about how representative you are of other teachers. But the
external validity of your results as they apply to your future use of the
treatment has to do with the way in which you handled the independent
variable. You should try to be certain that you are conscious of the way
in which you administered the treatment. If, for example, you are interested
in the extent to which different types of homework assignments result in
students completing their work on time, you need to be certain about the
important dimensions of how the assignments were given, so that you can later
give them in the same way. For instance, were the assignments given orally,
in a mimeographed handout, or written on the blackboard; at the beginning or
the end of class; etc.? Basically, the description of the independent
variable is critical so that the effects of its use can be anticipated
validly in future classroom use, and/or so that it can be replicated for
further research that might be desired.
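One way to keep such a description is to log, for every assignment, the dimensions of how it was given. The sketch below uses a hypothetical record format; the field names and sample entries are illustrative assumptions, covering the dimensions mentioned above (medium and point in the class period) plus a free-form note.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AssignmentRecord:
    """How one homework assignment was given (hypothetical log format)."""
    date: str
    medium: str   # e.g., "oral", "mimeographed handout", "blackboard"
    timing: str   # e.g., "beginning of class", "end of class"
    notes: str = ""


def tally_conditions(records):
    """Count how often each (medium, timing) combination was used."""
    return Counter((r.medium, r.timing) for r in records)


log = [
    AssignmentRecord("Oct 2", "mimeographed handout", "beginning of class"),
    AssignmentRecord("Oct 4", "mimeographed handout", "beginning of class"),
    AssignmentRecord("Oct 6", "oral", "end of class", "ran out of handouts"),
]
print(tally_conditions(log))
```

A tally like this shows at a glance whether the treatment was administered consistently; here the third entry records a lapse that could affect the results.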

Other threats to external validity have to do more directly with the
environment you establish, consciously or not, for your research. For
example, if you tell your students that they are part of a piece of research
you are doing, this could lead to several threats to external and internal
validity. One is the Hawthorne Effect. That is, your students may behave
differently because they know they are part of an experiment. The counter
to the Hawthorne Effect is the John Henry Effect: students who know they
are in a control group may work harder to do well because they are not
going to be shown up by the experimental students. There also is the
experimenter effect. You may convey your expectations to the students in
a way that influences their behavior. Any of these three effects may produce
a change in the students that is mistaken for the effect of the treatment
(internal validity). Because these effects are less likely to occur in future
use of your experimental treatment, they are also threats to external
validity.


The solution to the Hawthorne and John Henry effects is either to conceal
from the students that they are part of a research project or to build
that impression into future uses of the treatment. The latter might involve
trying to capitalize on the Hawthorne effect by becoming known as an
innovative, experimental teacher.

Related to the Hawthorne effect is the novelty and disruption effect.
If your treatment is a novel experience for the students, or if it upsets
the usual classroom routine, that may affect your results. You may not be
able to generalize to later classes you teach for whom the treatment has
become commonplace.

You also need to be sensitive to multiple treatment effects. These are
effects produced by exposure of students to two or more treatments. To go
back to the Energy and Environment Unit: if the teacher had just completed
with the students an experimental unit on "Population and Starvation," it
could be that positive results that seem due to the Energy unit are the
result of the combined effects of the two units. She may be able to
generalize only to situations in which students study both units. In a
sense, this becomes a question of selection (i.e., to what population can
she generalize?) or, put differently, of a selection-by-treatment
interaction. That is, there was a combined effect of prior experience and
the unit. This sort of interaction might occur for other reasons, e.g.,
because the experimental students were especially able academically or had
other characteristics that made the treatment more effective. Replicating a
research study with groups of students who have differing characteristics,
such as you might encounter in your classes, helps to establish
generalizability. You might also want to look at the results for different
subgroups within the experimental class(es) to see if there was an
interaction effect. For example, did boys and girls learn equally well with
the Energy Unit? (Boys' concerns with cars might make them more interested
in potential fuel scarcities, for example.)
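A minimal sketch of the subgroup check, assuming each student in the experimental class has a pretest and a posttest score on the Energy Unit. The function name and the scores are hypothetical; the sketch only computes the average gain for each subgroup so that you can see whether one group seemed to benefit more.

```python
from statistics import mean


def mean_gain_by_subgroup(students):
    """Return the average posttest-minus-pretest gain for each subgroup.

    students -- list of (subgroup, pretest, posttest) tuples
    """
    gains = {}
    for group, pre, post in students:
        gains.setdefault(group, []).append(post - pre)
    return {group: mean(values) for group, values in gains.items()}


# Hypothetical scores for one experimental class.
scores = [
    ("boys", 40, 62), ("boys", 35, 58), ("boys", 50, 70),
    ("girls", 42, 55), ("girls", 38, 50), ("girls", 47, 61),
]
print(mean_gain_by_subgroup(scores))
```

A large difference between subgroup gains suggests a possible interaction, but with subgroups this small it is a question to pursue (by talking with students or by replication), not a conclusion.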


Similarly, history and treatment may interact. That is, the Energy Unit
might be effective only because of current media attention to an energy
crisis. The same effectiveness could not be expected without such media
assistance.

Testing is also very important to external validity, as it is to internal
validity. One aspect of testing and generalizability is the need to be
careful about expecting the same results with a test or tests different from
the test or tests used as dependent variables. Just because your students did
well, for instance, on one test of critical thinking doesn't necessarily mean
they will do well on another. Also, testing may interact with the treatment.
Taking a pretest may sensitize students to the content of an Energy Unit so
that they learn more than if they had not been pretested. This is a potential
threat only if a pretest is not always given with the unit. The posttest may
also provide a "learning" effect, but this is rarely a problem in classroom
research, as testing following a set of learning activities is common. The
time of testing may, however, be important. How well students do on a test
may depend on whether it is given right at the end of a unit or two weeks or
six months later.

Most of these threats to external validity can be minimized. Common-sense
solutions involve such things as not letting the students know they are
part of a research project (this can raise ethical problems if the content
is experimental and possibly objectionable to some parents, or if
participation in your project might keep students from learning things
expected of them by parents, other teachers, or school district
requirements) or, if it is not possible to disguise the use of new
materials, also using "new-appearing" materials in your control group(s),
if any are used. Again, replication is vital to determining whether you will
obtain the same results with groups of students with different
characteristics, but similar to those students you might teach, or at
different points in time, or after continued use. As mentioned above,
looking at subgroups of students can also be helpful (but be careful of the
regression effect) in determining how generalizable your results are.
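The regression effect can be seen in a small invented example. When a subgroup is chosen because its members scored low on a pretest, part of their low scores is usually chance error, so on a second test the group tends to score closer to the average even if nothing was done. The numbers below are hypothetical and deliberately simplified (identical true scores, fixed errors).

```python
from statistics import mean

# Hypothetical: every student has the same "true" achievement (60), and
# each test adds a different chance error. No treatment is given at all.
true_score = [60, 60, 60, 60]
pretest_error = [-10, -5, 5, 10]
posttest_error = [5, -10, 10, -5]

pretest = [t + e for t, e in zip(true_score, pretest_error)]
posttest = [t + e for t, e in zip(true_score, posttest_error)]

# Form a "low scorers" subgroup from the pretest, as one might when
# asking whether the treatment helped the weaker students.
low = sorted(range(len(pretest)), key=lambda i: pretest[i])[:2]
print("low group pretest mean:", mean(pretest[i] for i in low))
print("low group posttest mean:", mean(posttest[i] for i in low))
```

The low group "gains" five points with no treatment at all, which is why an apparent benefit for initially weak students should be interpreted cautiously.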

Statistics

Do you need to know statistics to do valid classroom research? No,
you do not. It may be helpful to be able to compute some simple descriptive
statistics: measures of the central tendency of scores for your group, such
as the mean (the arithmetic average), the median (the point above and below
which fifty percent of the scores fall), or the mode (the most frequently
occurring score); or measures of the dispersion or spread of scores, such as
the range (the highest minus the lowest score, plus 1) and the standard
deviation (the square root of the mean squared deviation about the mean, a
somewhat more complicated statistic described in every elementary
educational statistics book).
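The descriptive statistics defined above can be computed with Python's standard `statistics` module. This sketch follows the paper's definitions: the range uses the "plus 1" convention, and the standard deviation divides by the number of scores (the population formula, `pstdev`) rather than by n - 1. The scores are hypothetical.

```python
import statistics


def describe(scores):
    """Central tendency and dispersion for a list of test scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # multimode lists every most-frequent score, in case of ties
        "mode": statistics.multimode(scores),
        # range as defined above: highest minus lowest, plus 1
        "range": max(scores) - min(scores) + 1,
        # square root of the mean squared deviation about the mean
        "sd": statistics.pstdev(scores),
    }


scores = [70, 75, 75, 80, 85, 90, 95]
print(describe(scores))
```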

Measures of dispersion are particularly important in determining whether
your treatment is effective for all students. You may find that whether or
not the central tendency of the scores changes, the dispersion has, because
some students do particularly well with your treatment and/or others do
particularly poorly with it.
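To illustrate, consider two hypothetical sets of posttest scores with the same mean but very different spreads, as might happen if a treatment helps some students a great deal and leaves others behind. The numbers are invented for illustration.

```python
import statistics

# Hypothetical posttest scores with the same mean but different spreads.
control = [68, 70, 72, 74, 76]     # bunched near the middle
treatment = [50, 55, 72, 90, 93]   # some did very well, others poorly

for name, scores in [("control", control), ("treatment", treatment)]:
    print(name, "mean =", statistics.mean(scores),
          "sd =", round(statistics.pstdev(scores), 1))
```

Looking only at the means, the treatment would seem to have had no effect; the standard deviations show that it affected students very differently.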

If you do use measures of central tendency and dispersion, be wary that
you do not depend on them too heavily. Reliance on descriptive statistics
can obscure many interesting insights into what has happened to your class
as a result of the treatment. Inspect your data as well (for example, by
examining individual tests) and use your wealth of knowledge about your
students, your school, and your teaching to interpret the results. You may
find yourself talking to students to discover answers to questions raised by
your inspection of the data. (E.g., Why did the girls do better or more
poorly on an Energy Unit?) This is an important data-gathering technique,
one that educational researchers often feel uncomfortable using because they
have been educated to be concerned about maintaining formal designs and
data-collecting techniques.

Well, how about inferential statistics, such as analysis of variance?
These are of little use for classroom research such as that discussed in
this paper. Techniques such as analysis of variance are used to determine
whether your results might have occurred by chance if your groups had been
drawn randomly from the same population. Not only will teachers doing
classroom research rarely have the chance to select their groups randomly,
but they are not likely to be interested in generalizing to broad
populations as educational researchers are (researchers who, alas, often
also lack randomly selected groups). A better bet for the teacher is to
specify ahead of time what changes will be educationally meaningful (e.g.,
How many more of my students must hand in their homework before I adopt the
new method of giving homework assignments?) and then to check the results
against that criterion. Educational significance is much more important in
the classroom setting than statistical significance. And using replication
to establish that the results can be attained again is more powerful than
statistical analysis, too.
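A minimal sketch of checking results against a criterion fixed in advance, using the homework example above. The threshold (at least four more students handing in homework on time) and the counts are hypothetical; the point is that the criterion is written down before the results are in.

```python
def meets_criterion(old_count, new_count, required_increase):
    """True if the new method raised the on-time count by the preset amount."""
    return new_count - old_count >= required_increase


# Decided before the study: adopt the new method only if at least
# 4 more students hand in homework on time. Hypothetical numbers.
REQUIRED_INCREASE = 4

old_method_on_time = 19   # e.g., average per assignment under the old method
new_method_on_time = 24

decision = meets_criterion(old_method_on_time, new_method_on_time,
                           REQUIRED_INCREASE)
print("Adopt new method" if decision else "Keep old method")
```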



If you do know about inferential statistics, especially nonparametric
ones (ones that make no assumptions about the shape of the distributions in
the populations from which your groups are drawn) such as chi-square, don't
hesitate to use them. You may want to ask, for example, how likely it is
that a particular distribution of scores could have occurred by chance. Or
analysis of covariance can be of some assistance by making statistical
adjustments for initial differences between treatment and control groups.
But don't become over-reliant on inferential statistics, so that questions
of educational significance are overlooked or so that you don't trust your
own insights into what happened.
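For readers who do want a nonparametric check, here is a sketch of the chi-square statistic for a 2 x 2 table, such as homework handed in on time versus late under two methods of giving assignments. The counts are hypothetical. The statistic is computed by hand from observed and expected frequencies, with no continuity (Yates) correction; 3.841 is the commonly tabled critical value for one degree of freedom at the .05 level. With expected counts below about 5 in any cell, the test is not trustworthy.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts.

    table -- [[a, b], [c, d]]; rows are groups, columns are outcomes
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # count expected if the outcome were unrelated to the group
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: [on time, late] for each method.
observed = [
    [24, 6],   # new method of giving assignments
    [16, 14],  # old method
]
chi2 = chi_square_2x2(observed)
CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05
print(round(chi2, 2), "exceeds" if chi2 > CRITICAL_05_DF1 else "does not exceed",
      CRITICAL_05_DF1)
```

Even a result that exceeds the critical value says only that the difference is unlikely to be due to chance; whether eight more students handing in homework on time is educationally meaningful remains your judgment.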

Remember, too, that the inferential statistics model is basically a yes-no,
decision-making one; i.e., Can the result be accepted as non-chance or not?
Teachers will probably more often be doing research from a developmental
model. They will often be asking questions such as "How can I improve this
unit?" rather than "Should I teach this unit at all?" Inferential statistics
are not much help with the former type of question.
Conclusion

Using your own intellectual resources to examine your data, to contemplate
what went on during the treatment, and to interpret the results, and using
informal ways of determining how and why your students reacted as they did,
are critical. The threats to experimental validity discussed in this paper
may help you to be aware of possible errors and to take them into account in
drawing conclusions about a treatment's effectiveness and the extent to
which it is generalizable to other classes you will teach. The discussion
of threats is meant, as is the discussion of designs, as an aid to teachers
concerned with making sound judgments about the curricular and instructional
issues that concern them.